24 research outputs found
Predicting wikipedia infobox type information using word embeddings on categories
Wikipedia has emerged as the largest multilingual, web-based general reference work on the Internet. Enormous human effort has been invested in creating and updating Wikipedia articles, which are ideally complemented by so-called infobox templates that define the type of the underlying article. It has been observed that Wikipedia infobox type information is often incomplete and inconsistent for various reasons. However, this information plays a fundamental role in the RDF type information of Wikipedia-based Knowledge Graphs such as DBpedia, which creates the need for correct and complete infobox types. In this work, we propose an approach to predict Wikipedia infobox types by using word embeddings on the categories of Wikipedia articles, and we analyze the impact of using minimal information from the Wikipedia articles in the prediction process.
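The core idea can be sketched as follows: average the embeddings of an article's category names and assign the infobox type whose representative vector is closest. The toy 3-d embeddings, type centroids, and category names below are purely illustrative assumptions, not the paper's actual model or data.

```python
# Minimal sketch: predict an infobox type from the word embeddings of an
# article's categories by averaging them and picking the nearest type centroid.
# Embeddings and type centroids here are toy values for illustration only.

import math

EMBEDDINGS = {
    "footballers": [0.9, 0.1, 0.0],
    "sportspeople": [0.8, 0.2, 0.1],
    "rivers": [0.0, 0.9, 0.2],
    "geography": [0.1, 0.8, 0.3],
}

TYPE_CENTROIDS = {
    "Person": [0.85, 0.15, 0.05],
    "Place": [0.05, 0.85, 0.25],
}

def mean_vector(words):
    """Average the embeddings of the known category words."""
    vecs = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def predict_type(categories):
    """Return the infobox type whose centroid is closest to the category mean."""
    v = mean_vector(categories)
    return max(TYPE_CENTROIDS, key=lambda t: cosine(v, TYPE_CENTROIDS[t]))

print(predict_type(["footballers", "sportspeople"]))  # Person
```

In practice the embeddings would come from a model trained on Wikipedia text, and the classifier would be learned rather than a fixed nearest-centroid rule.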
DORIS: Discovering Ontological Relations In Services
We propose to demonstrate DORIS, a system that automatically maps the schema of a Web service to the schema of a knowledge base. Given only the input type and the URL of the Web service, DORIS executes a few probing calls and deduces an intensional description of the Web service. In addition, it computes an XSLT transformation function that can transform a Web service call result in XML into RDF facts in the target schema. Users will be able to play with DORIS and to see how real-world Web services can be mapped to large knowledge bases of the Semantic Web.
Temporal Role Annotation for Named Entities
Natural language understanding tasks are key to extracting structured and semantic information from text. One of the most challenging problems in natural language is ambiguity, and resolving it requires context, including temporal information. This paper focuses on the task of extracting temporal roles from text, e.g. CEO of an organization or head of a state. A temporal role has a domain, which may resolve to different entities depending on the context and especially on temporal information, e.g. CEO of Microsoft in 2000. We focus on temporal role extraction as a precursor for temporal role disambiguation. We propose a structured prediction approach based on Conditional Random Fields (CRF) to annotate temporal roles in text, relying on a rich feature set that extracts syntactic and semantic information from text.
We perform an extensive evaluation of our approach based on two datasets. In the first dataset, we extract nearly 400k instances from Wikipedia through distant supervision, whereas in the second dataset, a manually curated ground truth consisting of 200 instances is extracted from a sample of The New York Times (NYT) articles. Finally, the proposed approach is compared against baselines, showing significant improvements on both datasets.
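A linear-chain CRF consumes a feature map per token. The sketch below shows the kind of lexical and contextual features such a tagger might use for role annotation; the feature names and the small role gazetteer are hypothetical, not the paper's actual feature set.

```python
# Sketch of a per-token feature map as fed to a linear-chain CRF for
# temporal role annotation. The gazetteer and feature names are illustrative.

ROLE_LEXICON = {"ceo", "president", "head", "chancellor"}  # hypothetical gazetteer

def token_features(tokens, i):
    """Lexical and contextual features for token i."""
    w = tokens[i]
    feats = {
        "word.lower": w.lower(),
        "word.istitle": w.istitle(),
        "word.isdigit": w.isdigit(),
        "suffix3": w.lower()[-3:],
        "in_role_lexicon": w.lower() in ROLE_LEXICON,
        "BOS": i == 0,                  # beginning of sentence
        "EOS": i == len(tokens) - 1,    # end of sentence
    }
    if i > 0:
        feats["prev.lower"] = tokens[i - 1].lower()
    if i < len(tokens) - 1:
        feats["next.lower"] = tokens[i + 1].lower()
    return feats

sent = "Bill Gates was CEO of Microsoft in 2000".split()
print(token_features(sent, 3)["in_role_lexicon"])  # True
```

Such feature dictionaries can be passed directly to CRF toolkits that accept string-valued features; the actual model would add the syntactic and semantic features described above.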
Leveraging Mathematical Subject Information to Enhance Bibliometric Data
The field of mathematics is known to be especially challenging from a bibliometric point of view. Its bibliographic metrics are especially sensitive to distortions and are heavily influenced by the subject and its popularity. Therefore, quantitative methods are prone to misrepresentations and need to take subject information into account. In this paper we investigate how the mathematical bibliography of the abstracting and reviewing service Zentralblatt MATH (zbMATH) could further benefit from the inclusion of mathematical subject information from the Mathematics Subject Classification (MSC2010). Furthermore, the mappings of MSC2010 to Linked Open Data resources have been upgraded and extended to also benefit from semantic information provided by DBpedia.
SOFYA: Semantic on-the-fly Relation Alignment
Recent years have seen the rise of Web data, in particular Linked Data, with, up to now, more than 1000 datasets in the Linked Open Data Cloud (LOD). These datasets are mostly of entity-centric nature and are highly heterogeneous in terms of domains, language, schema, etc. Hence, the vision of uniformly querying such resources in the LOD has a long way to go. While equivalent entity instances across datasets are often linked by sameAs links, relations from different datasets and schemas are usually not aligned. In this paper, we propose an online instance-based relation alignment approach. The alignment may be performed during query execution and requires only partial information from the datasets. We align relations to a target dataset using association rule mining approaches, sampling for equivalent entity instances with two main sampling strategies. Preliminary experiments show that we are able to align relations with high accuracy, even if accessing the entire datasets is impossible or impractical.
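The instance-based idea can be sketched as an association rule over shared (subject, object) pairs: if most pairs connected by a source relation are also connected by a target relation, propose an alignment. The toy triples and the simple confidence threshold below are illustrative assumptions, not the system's actual mining procedure.

```python
# Sketch of instance-based relation alignment: over a sample of entity pairs
# (assumed already matched via sameAs links), count how often a source
# relation's (subject, object) pairs are also connected by each target
# relation, and keep rules whose confidence passes a threshold.

from collections import defaultdict

source = [("q:p1", "Berlin", "Germany"), ("q:p1", "Paris", "France")]
target = [("t:locatedIn", "Berlin", "Germany"), ("t:locatedIn", "Paris", "France"),
          ("t:capitalOf", "Paris", "France")]

def align(source_triples, target_triples, min_conf=0.8):
    src_pairs = defaultdict(set)
    for rel, s, o in source_triples:
        src_pairs[rel].add((s, o))
    tgt_pairs = defaultdict(set)
    for rel, s, o in target_triples:
        tgt_pairs[rel].add((s, o))
    rules = {}
    for r1, pairs in src_pairs.items():
        for r2, tpairs in tgt_pairs.items():
            conf = len(pairs & tpairs) / len(pairs)  # confidence of r1 => r2
            if conf >= min_conf:
                rules[(r1, r2)] = conf
    return rules

print(align(source, target))  # {('q:p1', 't:locatedIn'): 1.0}
```

Because confidence is computed over a sample rather than the full datasets, this style of alignment works even when accessing the entire datasets is impractical, as the abstract notes.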
TableNet: An approach for determining fine-grained relations for wikipedia tables
We focus on the problem of interlinking Wikipedia tables with fine-grained table relations: equivalent and subPartOf. Such relations allow us to harness semantically related information by accessing related tables or the facts therein. Determining the type of a relation is not trivial: relations depend on the schemas, the cell values, and the semantic overlap of the cell values in tables. We propose TableNet, an approach for interlinking tables with subPartOf and equivalent relations. TableNet consists of two main steps: (i) for any source table, an efficient algorithm finds candidate related tables with high coverage, and (ii) a neural approach, based on the table schemas and data, determines the fine-grained relation with high accuracy. Based on an extensive evaluation with more than 3.2M tables, we show that TableNet retains more than 88% of relevant table pairs and assigns table relations with an accuracy of 90%.
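The first step, candidate generation, can be illustrated with a simple schema-overlap filter: propose table pairs whose column headers overlap enough, and leave the fine-grained classification to a downstream model. The Jaccard measure, the threshold, and the toy tables below are simplifying assumptions, not TableNet's actual algorithm.

```python
# Sketch of candidate generation for table interlinking: propose related
# tables by the Jaccard overlap of column headers. The tables and the
# threshold are illustrative; the real system also uses cell values.

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

tables = {
    "T1": ["Country", "Capital", "Population"],
    "T2": ["Country", "Capital", "Area"],
    "T3": ["Player", "Club", "Goals"],
}

def candidates(source, threshold=0.4):
    """Candidate tables whose schema overlaps the source table's schema."""
    src = tables[source]
    return [t for t in tables
            if t != source and jaccard(src, tables[t]) >= threshold]

print(candidates("T1"))  # ['T2']
```

The surviving pairs would then be passed to the neural classifier, which decides between equivalent, subPartOf, or no relation.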
TECNE: Knowledge based text classification using network embeddings
Text classification is an important and challenging task due to its application in various domains such as document organization and news filtering. Several supervised learning approaches have been proposed for text classification. However, most of them require a significant amount of training data, and manually labeling such data can be very time-consuming and costly. To overcome the problem of labeled data, we demonstrate TECNE, a knowledge-based text classification method using network embeddings. The proposed system does not require any labeled training data to classify an arbitrary text. Instead, it relies on the semantic similarity between entities appearing in a given text and a set of predefined categories to determine the category to which the given document belongs.
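The unsupervised scoring step might look like the sketch below: score each predefined category by the average similarity between its embedding and the embeddings of entities linked in the text. The 2-d vectors and the entity list are made-up assumptions standing in for real network embeddings and an entity linker's output.

```python
# Minimal sketch of knowledge-based, label-free classification: pick the
# category whose embedding is most similar, on average, to the embeddings
# of the entities mentioned in the document. All vectors are toy values.

import math

ENTITY_VEC = {"Lionel_Messi": [0.9, 0.1], "FC_Barcelona": [0.8, 0.3]}
CATEGORY_VEC = {"Sports": [0.9, 0.2], "Politics": [0.1, 0.9]}

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def classify(entities):
    """Pick the category most similar, on average, to the text's entities."""
    def score(cat):
        return sum(cos(ENTITY_VEC[e], CATEGORY_VEC[cat]) for e in entities) / len(entities)
    return max(CATEGORY_VEC, key=score)

print(classify(["Lionel_Messi", "FC_Barcelona"]))  # Sports
```

No labeled documents are needed: only the pretrained embeddings and the category names, which is the point of the knowledge-based approach.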
Machine Learning against Hearing Loss: Predicting the Success of Cochlear Implant Treatment
So-called cochlear implants are not yet very widespread among people with hearing loss, partly because the degree of speech comprehension achievable with an implant is difficult to estimate before the operation. In a project funded by the Volkswagen Foundation, researchers at Hannover Medical School (MHH), Technische Universität Braunschweig, and the L3S Research Center aim to analyze patient data in order to better predict the success of cochlear implant treatment.
Approaches Towards Unified Models for Integrating Web Knowledge Bases
My thesis aims at the automatic integration of new Web services into a knowledge base. For each method of a Web service, a view is computed automatically. The view is represented as a query on the knowledge base. Our algorithm also computes an XSLT transformation function, associated with the method, that can transform call results into a fragment conforming to the schema of the knowledge base. The novelty of our approach is that the alignment relies only on instances; it does not depend on the names of concepts or on the constraints defined by the schema. This makes it particularly relevant for Web services that are currently published on the Web, because these services use the REST protocol, which does not allow the publication of schemas. In addition, JSON seems to establish itself as the standard for representing service call results. Unlike XML, JSON does not use named nodes, so traditional alignment algorithms are deprived of the concept names on which they rely.